Abstract: Maintenance practices have long focused on time based “preventive
maintenance” techniques. Components were changed out and parts replaced based on
how long they had been in place instead of what condition they were in. A reliability
centered maintenance (RCM) program seeks to offer equal or greater reliability at
decreased cost by ensuring that only applicable, effective maintenance is performed and by
largely replacing time based maintenance with condition based maintenance. A
significant portion of this program involves introducing non-intrusive technologies, such
as vibration analysis, oil analysis and I/R cameras, to an existing labor force and
management team.

This  paper  discusses  what  is  involved  in  an  RCM  program  and  how  EG&G  is 
implementing  it  at  Kennedy  Space  Center  on  the  facilities  maintenance  program.  It 
discusses  technical  tools,  management  tools  and  people  issues  involved  in  achieving  the 
goal  of  “better,  faster,  cheaper”  in  the  facilities  arena. 
The maintenance program is an integrated, closed loop, continuous improvement process
that  includes  life  cycle  maintenance  planning,  asset  risk  assessment,  runtime,  calendar  & 
condition  based  maintenance,  outage  coordination,  facility  condition  assessment  and  cost 
accounting.  The  maintenance  program  is  proactive  in  nature,  reliability  centered  and  is  a 
true  asset  management  program.  Program  effectiveness  is  measured  in  terms  of  asset 
availability,  reliability  and  life  cycle  cost. 

An  essential  element  in  the  program  is  the  computerized  maintenance  management 
system  (CMMS)  with  the  capability  to  interface  electronically  with  subject  matter 
specific  software  such  as  predictive  maintenance  software  programs  for  vibration 
analysis.  The  software  provides  the  traditional  productivity  and  maintenance  cost  reports 
as  well  as  asset  condition  and  maintenance  requirements  reports.  It  generates  work  orders 
based  on  asset  condition  triggers  and  time  based  or  usage  based  preplanned  frequencies. 
The  asset  inventory,  with  pertinent  data  including  risk  codes  and  RCM  analysis 
information,  is  contained  in  the  CMMS.  This  enables  Maintenance  Engineers  to  trend 
equipment  failures  for  further  analysis,  and  is  the  means  of  continually  improving  the 
effectiveness of the assigned levels of maintenance associated with an asset or definable
group  of  assets. 
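The way work orders arise from condition triggers and from time based or usage based frequencies can be illustrated with a minimal sketch. It assumes a simplified asset record; the class `AssetStatus`, its field names, the threshold values and the function `work_order_reasons` are hypothetical and do not represent the interface of any particular CMMS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssetStatus:
    """Hypothetical snapshot of one asset as a CMMS might hold it."""
    name: str
    days_since_service: int
    run_hours_since_service: float
    condition_reading: float                      # e.g. latest vibration level
    calendar_interval_days: Optional[int] = None  # time based frequency
    run_hour_interval: Optional[float] = None     # usage based frequency
    condition_alarm: Optional[float] = None       # condition trigger level


def work_order_reasons(asset: AssetStatus) -> List[str]:
    """Return every trigger that calls for a work order on this asset."""
    reasons = []
    if (asset.calendar_interval_days is not None
            and asset.days_since_service >= asset.calendar_interval_days):
        reasons.append("calendar")
    if (asset.run_hour_interval is not None
            and asset.run_hours_since_service >= asset.run_hour_interval):
        reasons.append("usage")
    if (asset.condition_alarm is not None
            and asset.condition_reading >= asset.condition_alarm):
        reasons.append("condition")
    return reasons


# Illustrative values only: the pump is not yet due by calendar or run time,
# but its vibration reading has crossed the alarm level.
pump = AssetStatus("chilled water pump", 40, 100.0, 7.5,
                   calendar_interval_days=90, run_hour_interval=500.0,
                   condition_alarm=6.0)
print(work_order_reasons(pump))
```

An asset with no intervals or alarm level configured never generates a work order in this sketch, which corresponds to a run-to-failure item.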

Components  The  Maintenance  Program  is  a  closed  loop  process  that  ensures  continuous 
program  improvement.  The  first  functional  component  of  the  process  is  an  accurate 
inventory  of  assets  included  in  the  maintenance  program.  It  is  critical  to  know  what  is 
being  maintained  and  have  it  accurately  identified  in  the  CMMS. 


Life  Cycle  Planning  ensures  the  function  of  the  assets  is  clearly  defined,  understood  and 
documented  and  maintenance  requirements  are  planned  for  the  designed  life  of  the  asset. 
This  occurs  during  the  design  process  for  new  assets  and  is  documented  taking  into 
account  such  things  as  ease  of  access  to  components,  minimization  of  special  tooling, 
incorporation  of  data  for  predictive  maintenance  condition  trending,  etc.  Consideration  is 
also given to the expected life of materials specified in the design and to program
maintenance requirements resulting from expiration of the materials' useful life (e.g.,
repainting structures on a 7-8 year cycle or replacing roofing systems on 20 year cycles).
The more routine recurring maintenance including preventive tasks (service,
inspections  and  minor  repair)  and  predictive  testing  will  be  identified  utilizing  the  RCM 
methodology.  For  existing  assets,  this  takes  place  during  the  RCM  analysis. 


Once the asset inventory is established and entered into the computerized information
system and the function of the individual or defined group of assets is clearly understood
and documented, a risk assessment is performed. The risk assessment evaluates the impact of a
loss of function of the asset to determine the appropriate asset risk category.
Assets  fall  within  four  basic  risk  categories  (high,  medium,  low  or  negligible)  based  on 
the  lack  of  ability  to  support  mission  or  the  cost  involved  should  there  be  a  loss  of  asset 
function.  This  risk  assessment  is  the  first  step  in  developing  maintenance  requirements 
under  an  RCM  methodology. 

A  significant  component  of  the  program  in  terms  of  cost  effectiveness  is  the  methodology 
for  determining  maintenance  requirements.  The  RCM  philosophy  is  a  departure  from 
traditional  methods  of  determining  maintenance  requirements.  RCM  logically 
incorporates  the  most  effective  mix  of  reactive,  preventive,  predictive  and  proactive 
maintenance  practices  and  draws  on  their  respective  strengths.  RCM  applies  the  four 
maintenance  practices  where  each  is  most  appropriate  based  on  the  consequences  of 
failure  and  the  resulting  impact  to  mission.  This  combination  produces  optimum 
reliability  at  minimum  maintenance  cost  and  the  combined  benefits  far  exceed  those 
resulting  from  using  any  one  maintenance  practice.  RCM  incorporates  the  principle  that 
any  maintenance  task  performed  must  be  proven  to  be  applicable  and  effective. 
Applicable means that the performance of the task will prevent, mitigate or detect the
onset of a failure or discover a hidden failure that has already occurred. Effective
implies that, of the competing tasks, the selected task is the most cost effective option.


During an RCM analysis, engineers use a decision logic tree to assign the proper mix of
maintenance (Figure 1).

Figure 1. RCM Decision Logic Tree

Consider each of the following types of tasks:

o Servicing

o Inspection/test to verify operations or detect hidden failure

o Predictive maintenance task of some type to detect deterioration/impending failure

o Repair to reduce probability of failure occurrence

o Item replacement to reduce probability of failure occurrence

Will the failure result in a risk Category A incident? (HIGH RISK: loss of life/serious
injury; loss of shuttle/payload; loss/damage of a shuttle system or major
assembly/component.)
Yes: Task(s) selected should prevent failure occurrence.
No: Consider Category B.

Will the failure result in a risk Category B incident? (MEDIUM RISK: major operational,
environmental, or political consequences.)
Yes: Task(s) selected must be applicable and costs should be less than total impact costs.
No: Consider Category C.

Will the failure result in a risk Category C incident? (LOW RISK: minor operational,
environmental, or political consequences.)
Yes: Task(s) selected must be applicable and costs should be less than total impact costs.
No: Failure falls into risk Category D (inconvenience). No PM is recommended.

Is one or a combination of PM tasks applicable and cost effective for the applicable risk
consequence?
Yes: Implement task/tasks.
No: Redesign is required.

APPLICABLE - Task will prevent, mitigate or detect the onset of a failure or discover a
hidden failure that has already occurred.

EFFECTIVE - Among competing candidates, the selected task is the most cost effective
option.
This  decision  logic  tree  focuses  on  sustaining  the  reliability  of  assets  in  support  of  a 
defined  mission.  The  RCM  analysis  is  structured  to  implement  the  principle  that  no 
maintenance  task  will  be  performed  unless  it  is  justified.  The  criteria  for  justification  are 
safety,  reliability  and  cost  effectiveness  in  deferring  or  preventing  a  specific  failure  mode. 
Because  RCM  is  reliability  based,  statistical  analysis  and  conditional  probabilities  of 
failure  are  important  in  determining  the  consequences  of  failure.  The  primary  objective  is 
to  maintain  the  inherent  reliability  designed  into  the  asset.  The  product  of  the  RCM 
analysis  is  work  procedures  for  both  preventive  and  predictive  maintenance  that  are 
captured  in  the  CMMS.  The  performance  schedule  is  also  generated  in  the  CMMS  as  a 
basis  for  initiating  preventive/predictive  maintenance. 
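The decision logic tree can be sketched in code as follows. This is a simplified reading of Figure 1, not the analysis procedure itself: the names `RiskCategory` and `select_action` are hypothetical, the risk category is taken as an input rather than derived, and treating Category A as not gated by cost is an assumption.

```python
from enum import Enum


class RiskCategory(Enum):
    A = "high"
    B = "medium"
    C = "low"
    D = "negligible"


def select_action(category: RiskCategory,
                  task_is_applicable: bool,
                  task_cost: float,
                  impact_cost: float) -> str:
    """Walk one failure mode through a simplified version of the logic tree."""
    if category is RiskCategory.D:
        return "no PM recommended"
    if not task_is_applicable:
        # No task prevents, mitigates or detects the failure.
        return "redesign required"
    if category is RiskCategory.A:
        # Assumption: for Category A the task must prevent the failure,
        # so cost is not used as a gate here.
        return "implement task"
    # Categories B and C: task cost should be less than total impact cost.
    if task_cost < impact_cost:
        return "implement task"
    return "redesign required"


# Illustrative costs only.
print(select_action(RiskCategory.B, True, 500.0, 20000.0))
print(select_action(RiskCategory.C, True, 500.0, 200.0))
```

The second call shows a low risk failure whose only candidate task costs more than the impact of the failure, so the task is not justified.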


The  next  program  component,  Facility  Condition  Assessment  (FCA),  is  important  to 
maintenance  engineers  and  managers  as  it  provides  feedback  on  asset  maintenance 
effectiveness.  The  FCA  is  an  asset  inspection  and  engineering  analysis  of  maintenance 
history,  failure  trends,  any  root  cause  failure  analysis  that  might  have  been  performed  and 
any open or planned work requirements. The purpose of the FCA is to validate
maintenance requirements identified during life cycle planning, review and revise the
effectiveness of the assigned mix of predictive, preventive and reactive maintenance,
identify any new asset deficiencies that may have been detected during the assessment
process, review planned maintenance work and review energy issues, if applicable.


Another important part of the FCA is validating the mission of the asset. Changes in
program requirements often drive changes in asset mission. When mission changes
occur, the level of assigned maintenance may require adjustment due to changes in asset
criticality. We perform FCAs on a five year cycle to coincide with the budget cycle.
Knowing  the  asset  mission,  the  asset  maintenance  history,  the  identified  and  planned 
maintenance  requirements  and  the  current  condition  of  the  asset,  work  can  be  prioritized 
and  programmed  for  performance  over  the  budget  cycle.  Existing  maintenance 
procedures  can  be  validated  and  adjusted  as  required,  monitoring  programs  implemented 
and  tests  conducted  on  assets  to  further  evaluate  any  suspected  problems.  The  FCA 
provides  a  structured  process  for  validating,  justifying  and  prioritizing  maintenance 
requirements. 


An appropriate level of maintenance cannot be assigned to an asset unless the
consequences of failure of that asset are clearly understood. RCM forces focus on the
product of a system, rather than on individual items within a system. As a result, many
items which are critical to a system's operation are found to have backups or work-arounds
designed into the system, so a failure or loss of an individual item does not result in a
system failure. An example of this is electric power distribution, where power to
a specific facility is critical. The loss of a feeder cable will result in no power through
that cable. It will not result in a power loss to the facility, however, because the facility
has dual power feeds from independent circuits, an emergency backup generator and a
UPS. The system does not fail, only the component.
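The effect of this redundancy can be shown with a short calculation. The sketch below assumes the power sources fail independently and treats them as simple parallel alternatives; the probabilities are invented for illustration and are not measured values, and the function name `probability_all_fail` is hypothetical.

```python
from math import prod


def probability_all_fail(failure_probabilities):
    """Probability that every independent power source is down at once."""
    return prod(failure_probabilities)


# Hypothetical chance that each source is unavailable when needed.
sources = {
    "feeder 1": 0.01,
    "feeder 2": 0.01,
    "backup generator": 0.05,
    "UPS": 0.02,
}

single_feed = sources["feeder 1"]
all_sources = probability_all_fail(sources.values())
print(f"single feeder: {single_feed:.2e}, all four sources: {all_sources:.2e}")
```

Under these assumptions the facility loses power far less often than any single component fails, which is why the risk assessment is made on the system function rather than on the individual item.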


Risk assessment is the first step in determining maintenance levels. Four risk levels have
been established, based on the consequences of failure: high, medium, low and no risk.
High and medium risk codes are often associated with catastrophic failures, but because
of economic impact costs, smaller failures can also fall into these categories. If a facility
suffers a loss of utilities and has no secondary feed (either onsite or portable), the people
in that building will have to stop work and leave. This "impact cost", which is distinct from
the repair cost, may not be obvious to maintainers, but it is real and must be a factor in evaluating the
risk level. The RCM analysis is structured to implement the principle that no
maintenance  task  will  be  performed  unless  it  can  be  justified.  The  criteria  for 
justification  are  safety,  reliability,  and  cost  effectiveness  in  deferring  or  preventing  a 
specific  failure  mode.  Because  RCM  is  reliability  based,  statistical  analysis  and 
conditional  probabilities  of  failures  are  important  in  determining  the  consequences  of 
failure.  The  primary  objective  is  to  maintain  the  inherent  reliability  designed  into  the 
equipment.  Figure  2  graphically  ties  all  the  parts  together. 
Figure 2

In  this  model,  it  has  been  determined  during  the  design  that  the  facility  is  expected  to 
support  its  defined  function  for  100  years.  The  roof  of  the  facility,  given  the  climate  the 
facility  will  be  subjected  to,  will  require  major  refurbishment  after  approximately  20 
years  of  service.  Therefore,  as  part  of  the  life-cycle  plan,  a  major  refurbishment  is 
identified 20 years from date of facility activation. In addition, historical data and
engineering studies reviewed while determining the exterior paint specifications indicate
that facilities require repainting every five years. This, too, is added to the life-cycle plan
as an identified program level maintenance requirement. Infrared thermography is also
scheduled on a frequent basis. It is used to perform a condition assessment of the roof and the electrical
panels throughout the facility in lieu of previously assigned labor intensive PM tasks.

Air  filters  are  replaced  on  a  regularly  scheduled  basis. 
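The program level requirements in this model can be laid out over the design life with a few lines of code. This is a sketch only: the function `life_cycle_plan` is hypothetical, and it assumes that the roof refurbishment recurs every 20 years and that no task is scheduled in the final year of the design life.

```python
def life_cycle_plan(design_life_years, task_intervals):
    """Map each service year to the program level tasks due in that year."""
    plan = {}
    for task, interval in task_intervals.items():
        # Tasks recur every `interval` years and stop before end of design life.
        for year in range(interval, design_life_years, interval):
            plan.setdefault(year, []).append(task)
    return dict(sorted(plan.items()))


# Intervals taken from the model: 100 year life, repaint every 5 years,
# roof refurbishment after about 20 years of service.
plan = life_cycle_plan(100, {"repaint exterior": 5, "refurbish roof": 20})
for year in (5, 20, 95):
    print(year, plan[year])
```

Years in which several tasks coincide, such as year 20 here, are natural candidates for outage coordination.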


Predictive  Maintenance,  also  known  as  Predictive  Testing  and  Inspection  (PT&I),  can 
determine  the  condition  of  the  equipment  and  provide  various  trend  indicators. 
Interpretation  of  these  indicators  allows  potential  functional  failure  to  be  forecast  so 
corrective maintenance can be performed to preclude failure. Working as a complement
to the PM program, a PT&I program can:

o Help determine the condition of a component and identify required repairs before
that component fails.

o Conserve resources by performing maintenance on an as-required basis rather
than on a calendar frequency or a run-time basis.

o Minimize down time.

The effectiveness of each applicable PT&I test is examined to determine which test or
combination of tests will be used. Any single test may not give a good representation
of the overall condition of each piece of equipment on the system. However, certain
combinations will give a very good indication of equipment condition. Comparisons
with previous tests provide trend data useful in condition assessment analysis.
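Trending of successive test results can be illustrated with a straight-line fit that estimates when a reading will reach an alarm level. The linear model is an assumption made for illustration, since no specific trending method is named here; the readings, the alarm level and the function names are hypothetical.

```python
def linear_trend(times, values):
    """Least-squares slope and intercept for readings taken at given times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def time_to_alarm(times, values, alarm_level):
    """Estimate the time at which the fitted trend reaches the alarm level.

    Returns None when the readings are flat or improving.
    """
    slope, intercept = linear_trend(times, values)
    if slope <= 0:
        return None
    return (alarm_level - intercept) / slope


# Hypothetical vibration readings taken every 30 days.
days = [0, 30, 60, 90]
readings = [2.0, 2.5, 3.0, 3.5]
print(time_to_alarm(days, readings, alarm_level=5.0))
```

A forecast of this kind gives the planner a window in which to schedule corrective work before the functional failure occurs.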

Historically, the focus of maintenance has been the Preventive Maintenance (PM)
program. Electrical and mechanical equipment experiences deterioration over time that
eventually causes it to fail. PM is used to slow this deterioration, preserving the
equipment’s operational life. A properly conducted program reduces overall operating
costs, aids mission effectiveness and safety, and assures the continued preservation,
usefulness, and performance of assets. The PM program, coupled with the other
elements  of  the  overall  maintenance  program,  allows  engineers  to  be  aware  of  equipment 
condition  so  that  sufficient  time  is  available  for  the  systematic  planning  and  scheduling  of 
required  repair  work. 

Preventive  maintenance  consists  of  the  planned  and  scheduled  maintenance  tasks  that  are 
periodically performed on equipment to avoid a breakdown. The frequency is based on
calendar date, rate of utilization (routine), or condition, which is determined by trending
data collected through the application of PT&I technologies. The PM program consists of
the  following: 

o  Inspections  of  mechanical,  electrical  and  other  physical  structures,  installed 
equipment  and  systems  such  as  motors,  pumps,  compressors,  faucets,  light  switches,  etc. 

o Inspections are performed on a periodic, pre-determined basis in an effort to
determine the degree of operating efficiency and whether equipment deficiencies exist.

o  Routine  servicing  of  equipment  including  lubrication,  cleaning  and  changing 
filters,  minor  adjustments  and  parts  replacement,  and  condition  reporting. 

o Formalized evaluation and work generation system which ensures discovered,
uncorrected deficiencies are entered into the normal planning and scheduling system.


Run-to-failure  is  a  reactive  component  because  it  is  based  on  the  premise  that  no 
maintenance  task  that  improves  the  reliability  of  the  F/S/E  in  a  cost  effective  manner  has 
been  identified.  Users  call  a  trouble  desk  to  report  breakdowns  on  run-to-failure  items. 
When the corrective action required is beyond the scope of a trouble call, when engineering
is required, or when material must be ordered, the trouble call is changed to a repair work
order. As with other work orders, labor, materials and material costs are tracked in a
CMMS. This information is then sent to a computer history file, which can be retrieved
later for use in F/S/E condition assessments, repair/replace decisions, failure
trending, and other engineering analysis.


Maintenance  Effectiveness  The  effectiveness  of  the  maintenance  program  must  be 
measured  and  validated.  Long  term  effectiveness  is  monitored  through  the  facility 
condition  assessment  while  short  term  effectiveness  is  determined  using  failure  trending 
analysis, which highlights failure trends on like equipment. This advance notice gives
time to take action to prevent catastrophic failures.

Failure  trending  codes  are  developed  by  maintenance  engineers  with  support  from  field 
technicians.  These  codes  are  used  by  the  technicians  in  the  field  to  track  and  classify 
failures  and  are  recorded  in  the  CMMS.  The  coding  structure,  coupled  with  existing 
report  filter  capabilities,  allows  a  relatively  quick  analysis  of  failure  data.  If  a  problem  is 
suspected,  a  more  detailed  analysis  is  performed.  Reports  provide  information  on  the 
following  elements:  1)  number  of  loss-of-function  events;  2)  cause  of  loss;  3)  disposition 
of  cause;  and  4)  corrective  action  taken. 
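A quick analysis of coded failure data might look like the following sketch. The record layout, the equipment types, the cause codes and the threshold of three events are all hypothetical; actual codes are developed by the maintenance engineers and stored in the CMMS.

```python
from collections import Counter


def recurring_failures(records, min_events=3):
    """Count loss-of-function events per (equipment type, cause code) pair
    and keep only the pairs that occur at least `min_events` times."""
    counts = Counter((r["equipment_type"], r["cause_code"]) for r in records)
    return {key: n for key, n in counts.items() if n >= min_events}


# Hypothetical failure records as a technician might code them in the field.
records = [
    {"equipment_type": "pump", "cause_code": "BEARING"},
    {"equipment_type": "pump", "cause_code": "BEARING"},
    {"equipment_type": "fan", "cause_code": "BELT"},
    {"equipment_type": "pump", "cause_code": "BEARING"},
]
print(recurring_failures(records))
```

Pairs that exceed the threshold are the ones for which a more detailed analysis would be performed.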


PROGRAM MEASUREMENTS (METRICS) The following metrics are reported to
measure the progress and cost effectiveness of the maintenance program.


a. Equipment Availability

% = (Hours System/Equipment is Available to Run at Capacity /
Total Hours During the Reporting Time Period) x 100


b. Maintenance Overtime Percentage

% = (Total Maintenance Overtime Hours During Period /
Total Regular Maintenance Hours During Period) x 100


c. Percent of Emergency Work to Routine Work

% = (Total Emergency Hours /
Total Maintenance Hours) x 100
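The three metrics are simple ratios and can be computed as in the sketch below. The function names and the sample hours are hypothetical; the hour totals would in practice come from CMMS reports.

```python
def percentage(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def equipment_availability(available_hours, period_hours):
    """Hours available to run at capacity over total hours in the period."""
    return percentage(available_hours, period_hours)


def overtime_percentage(overtime_hours, regular_hours):
    """Maintenance overtime hours over regular maintenance hours."""
    return percentage(overtime_hours, regular_hours)


def emergency_work_percentage(emergency_hours, total_maintenance_hours):
    """Emergency hours over total maintenance hours."""
    return percentage(emergency_hours, total_maintenance_hours)


# Illustrative figures for a 30 day (720 hour) reporting period.
print(round(equipment_availability(700, 720), 1))
print(overtime_percentage(50, 1000))
print(emergency_work_percentage(30, 600))
```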




Lessons  Learned  Many  of  the  lessons  we  learned  are  available  from  existing  texts,  both 
technical  and  management.  It  is  perhaps  inevitable  that  lessons  have  to  be  learned 
individually  in  order  to  be  understood,  and  so  many  of  the  lessons  presented  here  are  of 
an  obvious  nature.  By  far  our  biggest  finding  was  the  value  of  repetition.  By  definition, 
a  cycle  of  continuous  improvement  implies  doing  the  same  thing  over  and  over  and 
getting  a  bit  better  each  time.  Implementing  a  reliability  centered  maintenance  program 
involves  changing  the  way  people  think  and  work.  Training,  explanations,  briefings, 
analysis,  making  changes  and  tracking  results  were  done  on  an  individual  basis,  shop  by 
shop.  Selecting  a  visible,  intuitive  initial  technology  is  also  an  important  point.  Laser 
alignment  was  easily  demonstrated,  learned  and  understood;  vibration  monitoring  is  more 
involved  and  less  readily  grasped.  I/R  cameras  are  so  advanced  the  operation  is  simple; 
point  and  shoot  technology  allows  anyone  to  actually  see  the  temperature  difference 
between  a  loose  connection  and  a  proper  one.  As  we  were  able  to  show  results,  we  began 
to  build  a  cadre  of  supporters  who  functioned  as  champions  in  their  own  right. 

When we began this project, we went through a developmental phase and then an implementation
phase; we are now in an operational mode. It is no longer a phase - we have achieved a
shift  in  the  way  we  do  business.  The  very  nature  of  the  process  ensures  it  will  repeat 
itself  over  and  over  -  a  cycle  of  continuous  improvement.  This  program  is  not  something 
we  do  -  it  is  a  way  of  getting  things  done  in  an  efficient,  cost  effective  and  risk 
appropriate  manner. 